Search CORE

CiteSeerX

Bulgarian Digital Mathematics Library at IMI-BAS

N–Dimensional Orthogonal Tile Sizing Problem

Author: Andonov Rumen
Yanev Nicola
Publication venue: Institute of Mathematics and Informatics Bulgarian Academy of Sciences
Publication date: 01/01/1998
Field of study

AMS subject classification: 68Q22, 90C90We discuss in this paper the problem of generating highly efficient code when a n + 1-dimensional nested loop program is executed on a n-dimensional torus/grid of distributed-memory general-purpose machines. We focus on a class of uniform recurrences with non-negative components of the dependency matrix. Using tiling the iteration space strategy we show that minimizing the total running time reduces to solving a non-trivial non-linear integer optimization problem. For the later we present a mathematical framework that enables us to derive an O(n log n) algorithm for finding a good approximate solution. The theoretical evaluations and the experimental results show that the obtained solution approximates the original minimum sufficiently well in the context of the considered problem. Such algorithm is realtime usable for very large values of n and can be used as optimization techniques in parallelizing compilers as well as in performance tuning of parallel codes by hand

Solving Maximum Clique Problem for Protein Structure Similarity

Author: Andonov Rumen
Malod-Dognin Noël
Yanev Nicola
Publication venue
Publication date: 01/01/2009
Field of study

A basic assumption of molecular biology is that proteins sharing close three-dimensional (3D) structures are likely to share a common function and in most cases derive from a same ancestor. Computing the similarity between two protein structures is therefore a crucial task and has been extensively investigated. Evaluating the similarity of two proteins can be done by finding an optimal one-to-one matching between their components, which is equivalent to identifying a maximum weighted clique in a specific "alignment graph". In this paper we present a new integer programming formulation for solving such clique problems. The model has been implemented using the ILOG CPLEX Callable Library. In addition, we designed a dedicated branch and bound algorithm for solving the maximum cardinality clique problem. Both approaches have been integrated in VAST (Vector Alignment Search Tool) - a software for aligning protein 3D structures largely used in NCBI (National Center for Biotechnology Information). The original VAST clique solver uses the well known Bron and Kerbosh algorithm (BK). Our computational results on real life protein alignment instances show that our branch and bound algorithm is up to 116 times faster than BK for the largest proteins

arXiv.org e-Print Archive

Bulgarian Digital Mathematics Library at IMI-BAS

Integer Programming Approach for Nested Pairs Genome Scaffolding

Author: Andonov Rumen
Epain Victor
Publication venue: HAL CCSD
Publication date: 18/03/2022
Field of study

Scaffolding step in the genome assembly aims to determine the order and the orientation of a huge number of previously assembled genomic fractions (contigs/scaffolds). Here we introduce a particular case of this problem and denote it by Nested Pairs Scaffolding. We formulate it as an optimisation problem and propose an integer programming formulation for its resolution. The performed computational results on real and synthetic data show an excellent behaviour of our formulation

Graphes de Chevauchements à destination d'Algorithmes d'Assemblage et de Scaffolding : Retours sur les Paradigmes et Propositions d'Implémentations

Author: Andonov Rumen
Epain Victor
Publication venue: HAL CCSD
Publication date: 14/10/2022
Field of study

Assembling DNA fragments based on their overlaps remains the main assembly paradigm with long DNA fragments sequencing technologies, independently of the aim to resolve only one or several haplotypes. Since an overlap can be seen as a succession relationship between two oriented fragments, the directed graph structure has emerged as an appropriate data structure for handling overlaps. However, this graph paradigm does not appear to take benefit of the reverse symmetry of the orientated fragments and their overlaps, which is a result of blind DNA double-strand sequencing. Thus, the bi-directed graph paradigm was introduced in 1995 towards reducing the graph size by handling the reverse symmetry, and becomes since then the main graph paradigm used in assembly/scaffolding methods. Nevertheless, the available graph paradigms have never been contrasted before, and no implementations have been described. Here we make a complete review on the existing overlap graph paradigms. Furthermore, we present suitable data structures that are theoretically compared in terms of time and memory consumption in the context of the design of some basic graph algorithms. We also show that each one of the paradigms can be switched to another by slightly modifying their data structures

arXiv.org e-Print Archive

Lagrangian Approaches for a class of Matching Problems in Computational Biology

Author: Andonov Rumen
Balev Stefan
Veber Philippe
Yanev Nicola
Publication venue
Publication date: 01/01/2006
Field of study

This paper presents efficient algorithms for solving the problem of aligning a protein structure template to a query amino-acid sequence, known as protein threading problem. We consider the problem as a special case of graph matching problem. We give formal graph and integer programming models of the problem. After studying the properties of these models, we propose two kinds of Lagrangian relaxation for solving them. We present experimental results on real life instances showing the efficiency of our approaches

HAL - Normandie Université

Flexible Alignments for Protein Threading

Author: Andonov Rumen
Collet Guillaume
Gibrat Jean-François
Yanev Nicola
Publication venue: HAL CCSD
Publication date: 01/01/2009
Field of study

We present a new local alignment method for the protein threading problem. Local sequence-sequence alignments are widely used to find functionally important regions in families of proteins. However, to the best of our knowledge, no local sequence-structure alignment algorithm has been described in the literature. Here we model local alignments as Mixed Integer Programming (MIP) models. These models permit to align a part of a protein structure onto a protein sequence in order to detect local similarities. The paper describes two MIP models, compares and analyzes their performance by using ILOG CPLEX 10 solver

HAL Descartes

Scaffolding Optimal pour les Régions Répétées Inverses-Complémentaires de Génome de Chloroplastes

Author: Andonov Rumen
Epain Victor
Lavenier Dominique
Publication venue: HAL CCSD
Publication date: 05/07/2022
Field of study

International audienceScaffolding step in the genome assembly aims to determine the order and the orientation of a huge number of previously assembled genomic fractions (contigs/scaffolds). Here we introduce a particular case of this problem and denote it by Nested Inverted Fragments Scaffolding (NIFS). We formulate it as an optimisation problem in a particular kind of directed graph that we call Multiplied Doubled Contigs Graph (MDCG). Furthermore, we prove that the NIFS problem is NP-Hard. We also discuss how the chloroplast data have been generated by filtering the reads sequenced both from plants and chloroplasts. Moreover, we propose a graph structure to visualise the solution and to highlight the particularity of chloroplast's regions structure

Optimal de novo assemblies for chloroplast genomes based on inverted repeats patterns

Author: Andonov Rumen
Epain Victor
Lavenier Dominique
Publication venue: HAL CCSD
Publication date: 12/07/2021
Field of study

International audienceBackground Chloroplast genome assembly remains challenging because sequencing step outputs short reads both from plant and plastid genomes. Some recent dedicated assemblers [1,2] use the information of a highly conserved circular and quadripartite structure with a pair of dispersed inverted repeat regions in chloroplast genomes. Materials and methods We designed a dedicated pattern-driven de novo assembler which requires short unpaired reads uniquely (distances provided by paired-reads are not needed), sequenced from both the plant and its chloroplasts. A first step consists in separating the chloroplasts reads from the reads specific to plant. To this end we use the observation that the chloroplast genomes are over-represented compared to the plant genome. Then we compute an estimated coverage of the pre-assembled contigs and we keep the ones with higher coverage. The first step outputs an assembly graph where each vertex corresponds to a contig and is provided with an estimated multiplicity number. In the sequel we use another graph where each vertex is duplicated according to its multiplicity number and to the two possible contig orientations. The edges are duplicated respectively. In our approach the genome assembly is modelled as finding an elementary path in this graph. We formulate the dispersed repeats as linear constraints and we search for an elementary path using Integer Linear Programming similarly to [3]. In our approach inverted repeats correspond to occurrences of contigs paired with other occurrences of them but in reverse orientation. Their positions on the assembled sequence must satisfy nested-pairs pattern. We formulate the above constraints in terms of linear program where the objective is to maximize the nested-pairs number. Thus, we generalize a similar approach applied for RNA folding [4]. Indeed, in contrast to the later approach where the vertices correspond to bases with known sequence indices, in our case the positions of the contigs are variables. Our tool is implemented with Python 3 and uses the open-source PuLP package which integrates a free solver to solve the above optimization problem. Results We tested our program with QUAST [5] and we obtained very encouraging preliminary results, with high genome coverage (mostly >99%), and very low mismatches and indels rates. Conclusions We designed a chloroplast genome dedicated pattern-driven de novo assembler using only short unpaired reads. We formulate the conserved circular and quadripartite structure as linear constraints and implemented this model in an open-source program. Finally, QUAST evaluation returned some encouraging preliminary results